We investigate the capability of localizing node failures in communicationnetworks from binary states (normal/failed) of end-to-end paths. Given a set ofnodes of interest, uniquely localizing failures within this set requires thatdifferent observable path states associate with different node failure events.However, this condition is difficult to test on large networks due to the needto enumerate all possible node failures. Our first contribution is a set ofsufficient/necessary conditions for identifying a bounded number of failureswithin an arbitrary node set that can be tested in polynomial time. In additionto network topology and locations of monitors, our conditions also incorporateconstraints imposed by the probing mechanism used. We consider three probingmechanisms that differ according to whether measurement paths are (i)arbitrarily controllable, (ii) controllable but cycle-free, or (iii)uncontrollable (determined by the default routing protocol). Our secondcontribution is to quantify the capability of failure localization through (1)the maximum number of failures (anywhere in the network) such that failureswithin a given node set can be uniquely localized, and (2) the largest node setwithin which failures can be uniquely localized under a given bound on thetotal number of failures. Both measures in (1-2) can be converted intofunctions of a per-node property, which can be computed efficiently based onthe above sufficient/necessary conditions. We demonstrate how measures (1-2)proposed for quantifying failure localization capability can be used toevaluate the impact of various parameters, including topology, number ofmonitors, and probing mechanisms.
展开▼